An evaluation of conflation accuracy using finite-state transducers

Authors

  • Carmen Galvez
  • Félix de Moya Anegón
Abstract

Purpose – To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs).

Design/methodology/approach – Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Most experimental studies to date have focused on retrieval performance; very few have examined conflation performance. The normalization process we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus to merge term variants into canonical lemmatized forms. Conflation performance was evaluated with an adaptation of the recall and precision measures, based on accuracy and coverage rather than on actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm.

Findings – The main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms.

Originality/value – The report outlines the potential of transducers in their application to normalization processes.
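The abstract does not define the adapted recall and precision measures. As a rough illustration only, the sketch below assumes one plausible reading: precision is the share of analyzed variants that were conflated to the correct lemma (accuracy), and recall is the share of all variants that were conflated correctly (coverage). The function name, the dictionaries, and the Spanish word forms are hypothetical and are not taken from the paper.

```python
def conflation_scores(gold, predicted):
    """Return (precision, recall) for a conflation run.

    gold:      dict mapping each variant to its canonical lemma.
    predicted: dict mapping each variant to the system's lemma,
               or None when the system produced no analysis.
    Both measures are assumed definitions, not the paper's own.
    """
    analyzed = [v for v in gold if predicted.get(v) is not None]
    correct = [v for v in analyzed if predicted[v] == gold[v]]
    # Accuracy over the forms the system actually analyzed.
    precision = len(correct) / len(analyzed) if analyzed else 0.0
    # Coverage over every form in the gold standard.
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: one variant is left unanalyzed.
gold = {
    "casas": "casa",
    "casa": "casa",
    "cantaba": "cantar",
    "cantamos": "cantar",
}
predicted = {
    "casas": "casa",
    "casa": "casa",
    "cantaba": "cantar",
    "cantamos": None,
}

precision, recall = conflation_scores(gold, predicted)
print(precision, recall)
```

In this toy case every analyzed form is correct while one form is left unanalyzed, which is the pattern the findings describe: high accuracy with underanalysis of variants.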

Similar resources

Piezoceramic Element Design and Fabrication for Ultrasonic Transducer of Gas Meter

Ultrasonic transducers play a significant role in generating and receiving the acoustic waves in ultrasonic flowmeters. Depending on the required accuracy, the ultrasonic transducers can be installed either in one pair or more in an ultrasonic flowmeter. The main part of an ultrasonic transducer is its piezoceramic element. In this work, four piezoceramic elements with different diameter to thi...

Word Normalization in Twitter Using Finite-state Transducers

This paper presents a linguistic approach based on weighted finite-state transducers for the lexical normalisation of Spanish Twitter messages. The system developed consists of transducers that are applied to out-of-vocabulary tokens. Transducers implement linguistic models of variation that generate sets of candidates according to a lexicon. A statistical language model is used to obtain the m...

Term conflation methods in information retrieval: Non-linguistic and linguistic approaches

Purpose – To propose a categorization of the different conflation procedures at the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach – Presents a range of term conflation methods, that can be used in information retrieval. The uniterm and multiterm va...

Rational Kernels for Arabic Stemming and Text Classification

In this paper, we address the problems of Arabic Text Classification and stemming using Transducers and Rational Kernels. We introduce a new stemming technique based on the use of Arabic patterns (Pattern Based Stemmer). Patterns are modelled using transducers and stemming is done without depending on any dictionary. Using transducers for stemming, documents are transformed into finite state tr...

Part-of-Speech Tagging Using Parallel Weighted Finite-State Transducers

We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarante...

Journal title:
  • Journal of Documentation

Volume 62  Issue 

Pages  -

Publication date 2006